AITopics | deterministic system

During the exploration phase, an agent collects samples without using a pre-specified reward function. After the exploration phase, a reward function is given, and the agent uses samples collected during the exploration phase to computeanear-optimalpolicy.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
(2 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.31)

Add feedback

Agnostic Q -learning with Function Approximation in Deterministic Systems: Near-Optimal Bounds on Approximation Error and Sample Complexity

Neural Information Processing SystemsDec-24-2025, 22:32:16 GMT

The current paper studies the problem of agnostic $Q$-learning with function approximation in deterministic systems where the optimal $Q$-function is approximable by a function in the class $\mathcal{F}$ with approximation error $\delta \ge 0$. We propose a novel recursion-based algorithm and show that if $\delta = O\left(\rho/\sqrt{\dim_E}\right)$, then one can find the optimal policy using $O(\dim_E)$ trajectories, where $\rho$ is the gap between the optimal $Q$-value of the best actions and that of the second-best actions and $\dim_E$ is the Eluder dimension of $\mathcal{F}$. Our result has two implications: \begin{enumerate} \item In conjunction with the lower bound in [Du et al., 2020], our upper bound suggests that the condition $\delta = \widetilde{\Theta}\left(\rho/\sqrt{\dim_E}\right)$ is necessary and sufficient for algorithms with polynomial sample complexity.

deterministic system, function approximation, near-optimal bound, (8 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.42)

Add feedback

Efficient Exploration and Value Function Generalization in Deterministic Systems

Neural Information Processing SystemsSep-30-2025, 12:42:34 GMT

We consider the problem of reinforcement learning over episodes of a finite-horizon deterministic system and as a solution propose optimistic constraint propagation (OCP), an algorithm designed to synthesize efficient exploration and value function generalization. We establish that when the true value function lies within the given hypothesis class, OCP selects optimal actions over all but at most K episodes, where K is the eluder dimension of the given hypothesis class. We establish further efficiency and asymptotic performance guarantees that apply even if the true value function does not lie in the given hypothesis space, for the special case where the hypothesis space is the span of pre-specified indicator functions over disjoint sets.

efficient exploration, exploration and value function generalization, name change, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.42)

Add feedback

We thank all reviewers for their constructive comments and are glad that our contributions are largely recognized

Neural Information Processing SystemsAug-20-2025, 08:25:01 GMT

We thank all reviewers for their constructive comments and are glad that our contributions are largely recognized. Below, we address the reviewer's concerns point by point. A, we provide results of three MuJoCo manipulation examples: Pusher, Striker and Thrower . GAIL and GAIfO, our method is able to outperform all other LfO baselines. We thank the reviewer for the reminding.

deepmimic, gail, reviewer, (11 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Robots (0.31)

Add feedback